Kpax3: Bayesian bi-clustering of large sequence datasets
Similar resources
Kpax3: Bayesian bi-clustering of large sequence datasets.
Motivation: Estimation of the hidden population structure is an important step in many genetic studies. Often the aim is also to identify which sequence locations are the most discriminative between groups of samples for a given data partition. Automated discovery of interesting patterns that are present in the data can help to generate new biological hypotheses. Results: We introduce Kpax3, a ...
Clustering Large Symbolic Datasets
Clustering is the process of partitioning a set of labeled or unlabeled patterns into meaningful groups so that patterns in the same cluster are similar to each other in some sense and patterns in different clusters are dissimilar in a corresponding sense. A major outcome of the clustering process is an abstraction in the form of a description of the clusters; this abstraction can be useful in several...
Bayesian correlated clustering to integrate multiple datasets
Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct, but often complementary, information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integra...
Clustering of large time series datasets
Time series clustering is a very effective approach for discovering valuable information in various domains such as finance, embedded bio-sensors and genomics. However, the focus on making these algorithms efficient and scalable for time series data has come at the expense of the usability and effectiveness of clustering. In this paper a new multi-step approach is proposed to imp...
Clustering Large Datasets of Mixed Units
In this paper we propose an approach for clustering large datasets of mixed units. It represents each cluster by the distributions of the values of its variables (histograms), a representation that is compatible with merging of clusters. The proposed representation can also be used for clustering symbolic data. On the basis of this representation the adapted versions of leaders method and adding meth...
Journal
Journal title: Bioinformatics
Year: 2018
ISSN: 1367-4803,1460-2059
DOI: 10.1093/bioinformatics/bty056